The paper presents a discussion on the applicability of neural 
networks in the identification and control of dynamic systems. 
Emphasis is placed on the understanding of how the neural networks 
handle linear systems and how the new approach is related to 
conventional system identification and control methods. Extensions of 
the approach to non-linear systems are then made. The paper explains 
the fundamental concepts of neural networks in their simplest terms. 
Among the topics discussed are feedforward and recurrent networks in 
relation to the standard state-space and observer models, linear and 
non-linear auto-regressive models, linear predictors, one-step ahead control, 
and model reference adaptive control for linear and non-linear systems. 
Numerical examples are presented to illustrate the application of these 
important concepts. 


1. Introduction 

System identification and control are two related fields that have received 
considerable development in the last few decades. System identification deals with the 
problem of finding a mathematical description of a physical system from experimental data. 
Control theory devises ways to influence the system in a desirable and predictable manner. 
Typical control objectives are pointing control, vibration suppression, and tracking control. 
System identification provides the necessary mathematical model of a system for a 
particular control scheme to be designed. In turn, information gathered during the control 
process can be used to evaluate the validity of the assumed model. Existing system 
identification and control methods are based on mathematical systems theory, which first 
deals with deterministic then stochastic systems. For the most part, the systems under 
study are idealized. They are linear, time-invariant, and often assumed to be noise-free. 
When noises are present, they are assumed to be white, zero-mean, and with known 
characteristics. These assumptions are often justified because less idealized assumptions 
tend to render the analysis mathematically intractable. 

In practice, however, all systems are affected by noises and non-linearities, which 
may lead to instabilities for control laws that are based on idealized models. This motivates 
the development of a variety of control approaches which, if classified according to the 
amount of information required to design a controller, can be broadly divided into three 
classes: model-independent control, robust control, and adaptive control. A 
model-independent controller seeks to guarantee stability of the closed-loop system 
independently of the system model. As implied, such a design does not require that the 
system be known in advance. Robust controllers can tolerate certain specific variations 
about some nominal model, which must be known with some accuracy, and the 
variations about the nominal model need to be quantified. While a nominal model may be 
obtained analytically or experimentally, a meaningful characterization of the variations about 
the nominal model is often difficult to obtain. In both model-independent control and 
robust control, there is a trade-off between stability robustness and performance. This is 
because performance requires that the system be known with certainty. If for some reason 
such knowledge is in error, then the designed optimal performance will not be achieved and 
instabilities may occur. Striking a balance between the two control 
approaches is adaptive control, which involves some level of on-line parameter estimation, 
where knowledge of the system being controlled is gained during the control process. The 
estimated parameters can be either the system parameters or the controller gains. In the 
former case, known as indirect adaptive control, the parameters representing a mathematical 
model of the system are identified on-line, and the control input is then computed. In the 
latter case, known as direct adaptive control, the system identification step is bypassed and 
the controller gains are directly updated at each time step. Adaptive control identifies the 
appropriate parameters of the system only for the purpose of control, and thus offers a 
meaningful way to integrate system identification and control in one package. Adaptive 
control also offers the potential ability to handle systems with changing dynamics by 
constantly identifying the system and adjusting the control action accordingly. 

Recently, there has been a substantial amount of interest in the field of neural 
networks. As a collection of interconnected neurons, a multi-layer neural network with 
appropriate weights has been shown to be able to approximate any input-output function. 
Consequently, the neural network is a natural candidate in the area of identification and 
control of both linear and non-linear systems (Refs. 1-5). Neural networks are typically 
implemented in adaptive form, and thus possess attributes similar to those of adaptive control. 
The main objective of this paper is to examine the implications of the neural network 
approach for linear systems, and to see how this approach is similar to and different from 
conventional methods. Only after a firm grasp of how neural networks treat the linear 
problem can extensions to the non-linear problem be made, and the potential benefits 
of the non-linear approach be better understood and appreciated. The extent to which 
linear approaches can handle non-linear systems can also be revealed. To this end, basic 
concepts in neural networks will be presented and, whenever possible, direct connections to 
existing system identification and control methods will be made. Linear system identification 
techniques and adaptive control theory will be heavily drawn upon to make this 
connection (Refs. 6-7). In this paper, we focus on the role of neural networks as applicable to 
structural system identification and control, as opposed to other fields such as pattern 
recognition, image processing, and computer science. The recently developed 
Observer/Kalman filter identification (OKID) algorithm will also be discussed in the context 
of neural networks (Refs. 8-10). Potential applicability of the existing techniques to non-linear 
problems will be examined. Several numerical examples will be used to illustrate the basic 
concepts discussed in this paper. Since accelerometers are often used in structural system 
identification and control, a direct transmission term is included in the input-output models that 
are discussed in this paper. Minor adjustments can easily be made when this term is not 
present, which is the case treated in Ref. 1. 

2. The Neural Networks 


A neural network is simply a set of interconnected individual units called neurons. 
Depending on the connection between the neurons, there are two basic types of networks 
known as the multi-layer feedforward networks and the recurrent networks, which will be 
described in this section. 

2.1 The Neuron. As a basic building block for a neural network, an individual 
neuron has a finite number of scalar inputs and one scalar output. Associated with each 
input is a scalar weighting value. The input signals are weighted by these values and added 
at the summation junction. The combined signal is then passed through an activation 
function, producing the output signal. The activation function $\gamma(x)$ can take a variety of 
forms; the most common one is a sigmoid function denoted by $\mathrm{sigm}(x)$, 

$$ \mathrm{sigm}(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \tag{1} $$

A plot of the sigmoid function is shown in Fig. 1. Generally, the activation function 
can be any non-decreasing differentiable function which has finite limits at both ends. 



Let the r inputs of a neuron be denoted by $u_1, u_2, \ldots, u_r$ and the output by $y$. 
Let the r weights be denoted by $w_1, w_2, \ldots, w_r$. The output of the neuron can be 
expressed mathematically as 

$$ y = \gamma(x), \qquad x = \sum_{i=1}^{r} w_i u_i \tag{2} $$

The neuron is shown schematically in Fig. 2 below with the sigmoid function as the 
activation function. For simplicity of notation, the network weights for the i-th layer may 
sometimes be presented collectively as $W_i = \{w_1, w_2, w_3, \ldots\}$. 
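
As an illustration of Eqs. (1) and (2), the following is a minimal sketch of a single neuron in Python with NumPy. It assumes the sigmoid takes the odd form $(1 - e^{-x})/(1 + e^{-x})$, which is bounded between -1 and 1 as stated in Remark 2.1.1. The function names `sigm` and `neuron` and the numerical values are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigm(x):
    """Sigmoid assumed for Eq. (1): odd, with limits -1 and +1."""
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def neuron(u, w, activation=sigm):
    """Single neuron of Eq. (2): weighted sum of inputs, then activation."""
    x = np.dot(w, u)          # summation junction
    return activation(x)      # output signal

u = np.array([0.5, -1.0, 2.0])   # illustrative inputs u_1..u_3
w = np.array([0.2, 0.4, 0.1])    # illustrative weights w_1..w_3
y = neuron(u, w)

# A linear neuron (Remark 2.1.2) is obtained by passing the identity.
y_linear = neuron(u, w, activation=lambda x: x)
```

With this form of the sigmoid, `sigm(x)` coincides with `tanh(x / 2)`, so the neuron is approximately linear for small weighted sums.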







Figure 2: Schematic diagram of a single neuron. 


Remark 2.1.1. The activation function is a limiter that bounds the incoming 
signal, and it serves as the non-linear element in a neuron. The activation function given in 
Eq. (1) has a linear range about the origin, and is bounded between -1 and 1. If the output of 
a neuron is bounded between $-\alpha$ and $+\alpha$ by taking $\gamma(x) = \alpha\,\mathrm{sigm}(x)$ as its activation 
function, then Eq. (2) becomes 

$$ y = \gamma\!\left(\sum_{i=1}^{r} w_i u_i\right) = \alpha\,\mathrm{sigm}\!\left(\sum_{i=1}^{r} w_i u_i\right) = \alpha \bar{y} \tag{3} $$

which is the same as the output $\bar{y}$ of a neuron with the original activation function multiplied 
by a constant gain $\alpha$. Therefore, the activation function can be taken to be between -1 and 
1 provided an additional factor is inserted after the neuron. In a network, this factor is 
absorbed into the weights of the following neurons that directly receive the output of this 
neuron as their inputs. 

Remark 2.1.2. If the activation function is a linear function, $\gamma(x) = x$, then the 
neuron is a linear neuron. The input-output relationship of a linear neuron is 

$$ y = \sum_{i=1}^{r} w_i u_i \tag{4} $$

Equation (4) simply says that the output signal is a weighted (linear) combination of the 
input signals with the weighting coefficients being the weights of the neuron. 

2.2 Multi-Layer Feedforward Neural Network. A multi-layer feedforward 
neural network consists of an input layer, a number of hidden layers, and an output layer. 
In a fully connected feedforward network, every neuron in each layer accepts as its inputs 
all signals coming from all neurons in the layer immediately preceding it (see Fig. 3). In a 
partially connected network, some of these connections are missing. This is equivalent to 
setting the corresponding network weights to zero. Figure 3 shows a typical three-layer 
three-input three-output feedforward network with two hidden layers. 





Figure 3: A three-layer three-input three-output feedforward neural network. 

2.3 Recurrent Network. A feedforward network with time delay feedback 
elements is called a recurrent network. The delay elements take the outputs of certain 
neurons in the network, delay them for a certain number of time steps, and feed them back as 
inputs to the neurons. In other words, in a recurrent network, time-delayed outputs of a 
certain number of neurons are the inputs to other neurons. A special one-layer network 
where the delayed outputs of the neurons are fed back as inputs to themselves is called a 
Hopfield network (see Fig. 4). 





Figure 4: A three-input three-output Hopfield network. 

Remark 2.3.1. Unlike a feedforward network, a recurrent network contains self- 
propagation dynamics. Due to the feedback mechanism, starting with some non-zero initial 
conditions, the output values of a recurrent network will evolve over time, thus simulating 
a dynamic system. 

Remark 2.3.2. When compared to a recurrent network, a feedforward network is 
static in the sense that it simply accepts a set of input values (or input pattern) and 
produces a set of output values (or output pattern) without any self-propagation 
mechanism. Thus it may seem that a recurrent network is preferred for modelling dynamic 
systems. In fact, the two types of networks simply represent two different ways of 
modelling a dynamic system. In identification, if a set of input-output data is already 
available, then the weights of a recurrent network can be identified by a feedforward 
network in which the time-delayed outputs are treated as inputs. This may seem a bit confusing, 
but it should become clear when the connections between these types of neural networks and 
the standard ways of representing linear systems are made in the next section. 

3. Neural Network Representation of Linear Systems 

For each neuron, the only element that is non-linear is the activation function. If the 
activation is taken to be a linear function, $\gamma(x) = x$, then the network becomes linear. This 
is true no matter how complicated the neural network is. In this section, attention will be 
focused on linear networks. The relationship between these networks and conventional 
ways to represent linear systems will be discussed. 

3.1 A Multi-Layer Feedforward Linear Network. Consider the 
feedforward network shown in Figure 3, with the activation function being a linear 
function. It is instructive to first examine the simple case of a two-layer three-input 
one-output network shown in Figure 5 below. 

Figure 5: A two-layer three-input one-output feedforward network of linear neurons 

The network weights for the individual connections are shown in the figure. 
Since the neurons are linear, each neuron is represented by a summation junction, and the 
activation, being a linear function, is omitted. The output of the network in Fig. 5 is simply 

$$ \begin{aligned}
y &= w_7 (w_1 u_1 + w_2 u_2 + w_3 u_3) + w_8 (w_4 u_1 + w_5 u_2 + w_6 u_3) \\
  &= (w_7 w_1 + w_8 w_4) u_1 + (w_7 w_2 + w_8 w_5) u_2 + (w_7 w_3 + w_8 w_6) u_3 \\
  &= \bar{w}_1 u_1 + \bar{w}_2 u_2 + \bar{w}_3 u_3
\end{aligned} \tag{5} $$

which is the output of a single neuron with the following weights: 

$$ \begin{aligned}
\bar{w}_1 &= w_7 w_1 + w_8 w_4 \\
\bar{w}_2 &= w_7 w_2 + w_8 w_5 \\
\bar{w}_3 &= w_7 w_3 + w_8 w_6
\end{aligned} \tag{6} $$

The above example can be immediately generalized to show that a single-output 
multi-layer neural network with linear neurons is equivalent to a single linear neuron with 
appropriate weights. The following remarks can be immediately made. 
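
The collapse of a multi-layer linear network into a single neuron can be checked numerically. The sketch below, in Python with NumPy, uses randomly chosen weights (an assumption for illustration only) arranged as a hidden-layer matrix `W1` holding $w_1, \ldots, w_6$ and an output-layer matrix `W2` holding $w_7, w_8$; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))   # hidden layer: rows [w1 w2 w3] and [w4 w5 w6]
W2 = rng.standard_normal((1, 2))   # output layer: [w7 w8]

# Equivalent single-neuron weights, as in Eq. (6)
w_bar = W2 @ W1

u = rng.standard_normal(3)         # arbitrary input pattern
y_two_layer = W2 @ (W1 @ u)        # two-layer linear network, Eq. (5)
y_single = w_bar @ u               # single linear neuron

# A different set of eight weights that yields the same three equivalent weights
W1_alt, W2_alt = 2.0 * W1, 0.5 * W2
w_bar_alt = W2_alt @ W1_alt
```

The last two lines show one simple way in which distinct multi-layer weights give identical input-output behavior, which is why the individual weights cannot be determined uniquely from data.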

Remark 3.1.1. A multi-layer feedforward network of linear neurons is simply an 
over-parameterized set of linear equations where the over-parameterization takes the form 
of the type shown in Eqs. (6). This form is non-linear in the parameters. Thus the 
problem of determining these parameters from known input-output data is a non-linear 
parameter estimation problem even if the network is linear. 

Remark 3.1.2. Since any single-output feedforward network of linear neurons is 
equivalent to a particular single linear neuron, there is no benefit in using an 
over-parameterized multi-layer linear network for linear system identification. In fact, in such an 
over-parameterized model, the network weights cannot be uniquely determined from 
input-output data. This is obvious, for example, in the case of the network shown in Fig. 5: 
the set of three equations in (6) contains eight unknowns. The use of a complicated 
multi-layer linear network for linear system identification is therefore neither advantageous 
nor necessary. The same is true for the case of using a non-linear network to identify a 
linear system. 

Remark 3.1.3. For a multi-output system, a multi-layer feedforward network of 
linear neurons is equivalent to a collection of single neurons arranged in parallel, each of 
which has the same number of inputs. In other words, any multi-output multi-layer 
feedforward network of linear neurons has an equivalent multi-output single-layer network 
representation. 

3.2. Feedforward Linear Network and the State Space Model. This 
section describes the relationship between the feedforward linear network and the state 
space model, which is a common form of representing linear systems. The discrete-time 
state space model of an n-th order, m-input, q-output system is a set of n simultaneous 
first-order difference equations of the form 

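$$ \begin{aligned}
x(k+1) &= A x(k) + B u(k) \\
y(k) &= C x(k) + D u(k)
\end{aligned} \tag{7} $$

where the dimensions of A, B, C, and D are $n \times n$, $n \times m$, $q \times n$, and $q \times m$, respectively. 
Solving for the output y(k) in terms of the current and previous inputs, with zero initial 
conditions, yields 

$$ y(k) = \sum_{i=0}^{k} h_i u(k-i) \tag{8} $$

where the parameters 

$$ h_0 = D, \qquad h_k = C A^{k-1} B, \quad k = 1, 2, 3, \ldots \tag{9} $$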

are the Markov parameters of the system described by Eqs. (7), which are also the system 
pulse response samples. The Markov parameters are expressed in terms of the system 
discrete state space matrices A, B, C, D. Since the state vector is coordinate-dependent, the 
state space matrices are not unique for a given system but the Markov parameters are 
unique. Let the state vector be transformed by a coordinate transformation T, z(k) = Tx(k); 
then the relationship between u(k) and y(k) via the new state vector z(k) can be described by 
a new state space representation $TAT^{-1}$, $TB$, $CT^{-1}$, $D$. The system Markov parameters 
computed using the new state space matrices are the same as before, i.e., 

$$ h_k = C T^{-1} (T A T^{-1})^{k-1} T B = C A^{k-1} B \tag{10} $$
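
The definition in Eq. (9) and the invariance in Eq. (10) can be verified with a short numerical sketch in Python with NumPy. The second-order system and the transformation `T` below are arbitrary illustrative choices, and `markov_parameters` is a hypothetical helper name.

```python
import numpy as np

def markov_parameters(A, B, C, D, n_terms):
    """Return [h_0, ..., h_{n_terms-1}] with h_0 = D and h_k = C A^(k-1) B."""
    h = [D]
    A_pow = np.eye(A.shape[0])     # holds A^(k-1)
    for _ in range(1, n_terms):
        h.append(C @ A_pow @ B)
        A_pow = A_pow @ A
    return h

# Illustrative single-input single-output system (values are assumptions)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.2]])

T = np.array([[2.0, 1.0], [1.0, 1.0]])   # any invertible matrix
T_inv = np.linalg.inv(T)

h = markov_parameters(A, B, C, D, 6)
h_T = markov_parameters(T @ A @ T_inv, T @ B, C @ T_inv, D, 6)
```

Both lists agree to numerical precision even though the two sets of state space matrices differ.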

For an asymptotically stable system, the pulse response can be neglected after a finite 
number of time steps, say $p_s$. The input-output description in Eq. (8) can be approximated 
by a finite number of Markov parameters 

$$ y(k) \approx h_0 u(k) + h_1 u(k-1) + h_2 u(k-2) + \cdots + h_{p_s} u(k-p_s) \tag{11} $$

where $p_s$ is sufficiently large such that $C A^{k} B \approx 0$ for $k \ge p_s$. Comparing Eq. (11) with the 
structure of the linear neurons immediately leads to the following remarks. 

Remark 3.2.1. The elements of the Markov parameters are simply the weights of a 
single-layer linear network where inputs to the network include both current and past 
values of the input signal. Note that the time-delayed inputs do not affect the neuron 
configuration because they are feedforward signals and thus can be treated as separate input 
channels. This case is shown in Fig. 6. 

Figure 6: Representation of linear systems by a feedforward network 
with the system Markov parameters as network weights. 
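
A minimal sketch of Remark 3.2.1 in Python with NumPy follows. It compares the response of a state space model with that of a single linear neuron whose weights are the first $p_s + 1$ Markov parameters, as in Eq. (11). The system matrices, the truncation length `p_s = 30`, and the random input are assumptions made for illustration.

```python
import numpy as np

# Illustrative well-damped system (eigenvalues 0.5 and 0.3)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.2]])

p_s = 30
# Network weights [h_0, h_1, ..., h_ps] of the single linear neuron
h = [D] + [C @ np.linalg.matrix_power(A, k - 1) @ B for k in range(1, p_s + 1)]
weights = np.hstack(h)                      # shape (1, p_s + 1)

rng = np.random.default_rng(0)
u = rng.standard_normal(200)

# Reference response from the state space model, zero initial conditions
x = np.zeros((2, 1))
y_ss = []
for k in range(len(u)):
    y_ss.append((C @ x + D * u[k]).item())
    x = A @ x + B * u[k]

# Feedforward response: current and delayed inputs are separate input channels
y_ff = []
for k in range(len(u)):
    taps = np.array([u[k - i] if k - i >= 0 else 0.0 for i in range(p_s + 1)])
    y_ff.append((weights @ taps).item())

max_error = np.max(np.abs(np.array(y_ss) - np.array(y_ff)))
```

For this system the neglected Markov parameters are very small, so the two responses agree closely. A lightly damped system would require a much larger `p_s` for the same agreement.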

Remark 3.2.2. Since a general multi-layer feedforward network of linear neurons 
is equivalent to a single-layer network, the relationship between the weights of a multi-layer 
network which represents a linear system and the Markov parameters of its equivalent 
state space model is also immediately obvious. For example, if the network shown in Fig. 
5 represents some linear system with three non-zero Markov parameters, then 

$$ \begin{aligned}
\bar{w}_1 &= w_7 w_1 + w_8 w_4 = h_0 \\
\bar{w}_2 &= w_7 w_2 + w_8 w_5 = h_1 \\
\bar{w}_3 &= w_7 w_3 + w_8 w_6 = h_2
\end{aligned} \tag{12} $$

provided that $u_1 = u(k)$, $u_2 = u(k-1)$, $u_3 = u(k-2)$. 

Remark 3.2.3. In practice, if the system is lightly damped, a large number of 
system Markov parameters is needed for Eq. (11) to remain a valid approximation. This 
implies that the equivalent network representing the same system has a large number of 
input channels containing distant past input values, not a large number of hidden layers. In 
other words, it is not possible to represent such a system by simply adding extra neurons 
or extra hidden layers in the feedforward network. The fact that a large number of system 
Markov parameters is required to represent a lightly damped system in the form of Eq. (11) 
is a major weakness of the representation. The same can be said for the equivalent neural 
network representation. 

3.3. Recurrent Linear Network and the Observer Model. This section 
shows the connection between the recurrent network and an observer of the system. 
Adding and subtracting the term My(k) on the right-hand side of the state equation in Eq. (7) 
yields 

$$ \begin{aligned}
x(k+1) &= A x(k) + B u(k) + M y(k) - M y(k) \\
       &= (A + MC) x(k) + (B + MD) u(k) - M y(k)
\end{aligned} \tag{13} $$

If M is a matrix such that A + MC is deadbeat of order p, i.e., 

$$ (A + MC)^k = 0, \quad k \ge p \tag{14} $$

then for $k \ge p$, the output y(k) can be expressed as 

$$ y(k) = \alpha_1 y(k-1) + \cdots + \alpha_p y(k-p) + \beta_0 u(k) + \beta_1 u(k-1) + \cdots + \beta_p u(k-p) \tag{15} $$

where 

$$ \alpha_k = -C (A + MC)^{k-1} M, \qquad \beta_k = C (A + MC)^{k-1} (B + MD), \qquad \beta_0 = h_0 = D \tag{16} $$

The matrix M in the above development can be interpreted as an observer gain. The 
system considered in Eqs. (7) has an observer of the form 

$$ \begin{aligned}
\hat{x}(k+1) &= A \hat{x}(k) + B u(k) - M [\, y(k) - \hat{y}(k) \,] \\
\hat{y}(k) &= C \hat{x}(k) + D u(k)
\end{aligned} \tag{17} $$

Besides the effect of noises, $\hat{y}(k)$ may differ from $y(k)$ if the actual initial condition $x(0)$ is 
not known and some different initial condition is assumed for $\hat{x}(0)$. Defining the state 
estimation error $e(k) = x(k) - \hat{x}(k)$, the equation that governs $e(k)$ is 

$$ e(k+1) = (A + MC) e(k) \tag{18} $$


For an observable system, a matrix M exists such that the eigenvalues of A + MC may be 
placed in any desired (symmetric) configuration. If the matrix M is such that A + MC is 
asymptotically stable, then the estimated state $\hat{x}(k)$ tends to the true state $x(k)$ as k tends to 
infinity for any initial difference between the assumed observer state and the actual system 
state. The matrix M can therefore be interpreted as an observer gain. The parameters 
defined as 

$$ \bar{Y}(k) = C (A + MC)^{k-1} [\, B + MD, \; -M \,] = [\, \beta_k, \; \alpha_k \,] \tag{19} $$


are the Markov parameters of an observer system; hence they are referred to as observer 
Markov parameters. Like the system Markov parameters, the observer Markov 
parameters are also invariant with respect to a coordinate transformation of the state vector. 
To see this, again let the state vector be transformed by a coordinate transformation T, 
z(k) = Tx(k); then the observer is described by a new state space representation 
$TAT^{-1}$, $TB$, $CT^{-1}$, $D$ and a new observer gain $TM$. The observer Markov parameters 
computed using these new state space matrices and the new observer gain are the same as 
before, 

$$ \begin{aligned}
\bar{Y}(k) &= C T^{-1} (T A T^{-1} + T M C T^{-1})^{k-1} [\, TB + TMD, \; -TM \,] \\
           &= C (A + MC)^{k-1} [\, B + MD, \; -M \,], \quad k = 1, 2, 3, \ldots
\end{aligned} \tag{20} $$

Notice that in Eq. (15), the output y(k) is the open-loop response of the system, yet 
the coefficients $\alpha_k$, $\beta_k$ are related to an observer gain. Consider the special case where M 
is a deadbeat observer gain, for which all eigenvalues of A + MC are zero. The observer Markov 
parameters then become identically zero after a finite number of terms. For lightly damped 
structures, this means that the system can be described by a reduced number of observer 
Markov parameters $\bar{Y}(k)$ instead of an otherwise large number of the usual system Markov 
parameters $Y(k)$. For this reason, the observer Markov parameters are important in linear 
system identification. By examining the structure of Eq. (15), the following remarks 
can be made. 
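
The following sketch, in Python with NumPy, illustrates Eqs. (14) to (16) for a lightly damped single-output system. The system matrices are arbitrary illustrative values, and the deadbeat gain is computed with Ackermann's formula for the single-output case, which is a choice made here for convenience rather than a method taken from the paper.

```python
import numpy as np

# Illustrative lightly damped system (eigenvalue magnitude about 0.92)
A = np.array([[0.9, 0.2], [-0.2, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.5]])
n = A.shape[0]

# Ackermann's formula: L = A^n O^{-1} e_n puts all eigenvalues of A - L C at zero.
# In the sign convention of Eq. (13), the observer gain is M = -L.
O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])
e_n = np.zeros((n, 1))
e_n[-1, 0] = 1.0
M = -np.linalg.matrix_power(A, n) @ np.linalg.solve(O, e_n)

A_bar = A + M @ C          # deadbeat: A_bar^n = 0, Eq. (14)
p = n

# Observer Markov parameters, Eq. (16)
alpha = [(-C @ np.linalg.matrix_power(A_bar, k - 1) @ M).item() for k in range(1, p + 1)]
beta = [D.item()] + [
    (C @ np.linalg.matrix_power(A_bar, k - 1) @ (B + M @ D)).item()
    for k in range(1, p + 1)
]

# Open-loop response from a non-zero initial state
rng = np.random.default_rng(1)
u = rng.standard_normal(50)
x = np.array([[1.0], [-2.0]])
y = np.zeros(len(u))
for k in range(len(u)):
    y[k] = (C @ x + D * u[k]).item()
    x = A @ x + B * u[k]

# Eq. (15): for k >= p, y(k) follows from p past outputs and p + 1 inputs
y_arx = np.array([
    sum(alpha[i - 1] * y[k - i] for i in range(1, p + 1))
    + sum(beta[i] * u[k - i] for i in range(p + 1))
    for k in range(p, len(u))
])
```

Only two output coefficients and three input coefficients reproduce the response exactly for $k \ge p$, whereas the pulse response of this system takes many steps to decay.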

Remark 3.3.1. The input-output equation given in Eq. (15) can be represented by 
a recurrent network with a single layer of linear neurons. The number of neurons is equal 
to the number of outputs of the system. The inputs to the neurons consist of both the 
feedforward time-delayed input signals and the feedback time-delayed output signals. 
Figure 7 shows the configuration of such a network for a single-output system. 



Figure 7: Representation of a single-output system by a recurrent network 

Remark 3.3.2. The recurrent network weights are precisely the elements of the 
observer Markov parameters. The relationship between the weights of a recurrent network 
and an equivalent feedforward network is the same as that between the observer Markov 
parameters and the system Markov parameters. It can be shown that the system Markov 
parameters or the feedforward network weights are related to the recurrent network weights 
by 

$$ h_k = \beta_k + \sum_{i=1}^{k} \alpha_i h_{k-i} \tag{21} $$

where $\alpha_k = 0$, $\beta_k = 0$ for $k > p$. To describe a system of order n, the number of observer 
Markov parameters p must be such that $qp \ge n$, where q is the number of outputs. 
Furthermore, the maximum order of a system that can be described with p observer 
Markov parameters is $qp$ (Ref. 9). The implication of this result for the network configuration is 
that a recurrent network generally requires fewer parameters (or weights) than 
an equivalent feedforward network. The two equivalent networks, 
however, have the same number of neurons. The minimum number of recurrent network 
weight matrices that can describe the system is $p_{\min}$, which is the smallest value of p such 
that $q p_{\min} \ge n$. 
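
The recursion in Eq. (21) can be sketched in Python with NumPy for the single-input single-output case. The coefficient values are arbitrary illustrative numbers, and the function name is hypothetical. As a cross-check, the result is compared with $h_k = CA^{k-1}B$ computed from an observable canonical realization of the same coefficients, which is an assumption introduced here only for verification.

```python
import numpy as np

def system_markov_from_observer(alpha, beta, n_terms):
    """Eq. (21): h_k = beta_k + sum_{i=1}^{k} alpha_i h_{k-i}, with
    alpha_k = beta_k = 0 for k > p. alpha = [a_1..a_p], beta = [b_0..b_p]."""
    p = len(alpha)
    h = []
    for k in range(n_terms):
        h_k = beta[k] if k <= p else 0.0
        for i in range(1, min(k, p) + 1):
            h_k += alpha[i - 1] * h[k - i]
        h.append(h_k)
    return np.array(h)

alpha = [1.5, -0.7]        # recurrent (feedback) weights, illustrative
beta = [0.5, 1.0, 0.3]     # feedforward weights, illustrative
h = system_markov_from_observer(alpha, beta, 8)

# Cross-check with an observable canonical realization of the same model
A = np.array([[alpha[0], 1.0], [alpha[1], 0.0]])
B = np.array([[beta[1] + alpha[0] * beta[0]], [beta[2] + alpha[1] * beta[0]]])
C = np.array([[1.0, 0.0]])
h_ss = np.array(
    [beta[0]] + [(C @ np.linalg.matrix_power(A, k - 1) @ B).item() for k in range(1, 8)]
)
```

Five recurrent network weights generate as many feedforward weights as desired, which reflects the parameter economy noted in the remark above.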

Remark 3.3.3. As mentioned previously, to represent lightly damped structures, 
the feedforward representation requires a large number of weights. Furthermore, it is not 
possible to represent a marginally stable or unstable system by a feedforward network. 
However, it is possible to represent such a system by a recurrent network. The implication 
of this fact for the system identification problem will be discussed further in later sections. 

4. Identification of Linear Systems using Neural Networks 

It has been shown that a general network of linear neurons is equivalent to a single 
neuron with appropriate weights. The problem of linear system identification using neural 
networks is therefore reduced to finding these network weights from input-output data. The 
computation may be done off-line or on-line. In off-line computation, the input-output data 
is already available and a network representing the system is to be determined. On-line 
computation refers to the case where the network weights are continually updated as data is 
made available. 

4.1. Parallel vs. Series-Parallel Identification Models. From the previous 
discussion, it appears that the recurrent network is more advantageous in representing 
certain systems than the feedforward network. To identify the recurrent network weights, 
one can simply use the feedforward network configuration with actual delayed system 
outputs appearing as inputs to the feedforward network. Consider the two identification models 
shown in Figs. 8 and 9 below, which are known as the parallel and series-parallel 
identification models. The block denoted by D represents the time delay elements. 



Figure 8: Identification using parallel model. 






Figure 9: Identification using series-parallel model. 

The basic difference between the two schemes is that in the parallel identification 
model, the estimated output $\hat{y}(k)$ is computed from the model's own previous (estimated) 
values, whereas in the series-parallel model, it is based on actual output values. 
Mathematically, in the parallel model, the purpose of the identification is to obtain the 
estimates $\hat{\alpha}_k$, $\hat{\beta}_k$ of the coefficients $\alpha_k$, $\beta_k$ that minimize the estimation error 
$e(k) = y(k) - \hat{y}(k)$, where the estimated output $\hat{y}(k)$ is computed from 

$$ \hat{y}(k) = \hat{\alpha}_1 \hat{y}(k-1) + \cdots + \hat{\alpha}_p \hat{y}(k-p) + \hat{\beta}_0 u(k) + \hat{\beta}_1 u(k-1) + \cdots + \hat{\beta}_p u(k-p) \tag{22} $$

In the series-parallel model, however, the estimated output is computed from 

$$ \hat{y}(k) = \hat{\alpha}_1 y(k-1) + \cdots + \hat{\alpha}_p y(k-p) + \hat{\beta}_0 u(k) + \hat{\beta}_1 u(k-1) + \cdots + \hat{\beta}_p u(k-p) \tag{23} $$
The difference between the above two equations is subtle but important. As 
discussed in the previous section, the estimated output of the model in Eq. (22) is the 
estimated open-loop prediction even though the coefficients of the model are related to an 
observer. On the other hand, the estimated output of the model in Eq. (23) is that of an 
observer. To see this, substituting the expression for $\hat{y}(k)$ into the estimated state equation in 
(17) produces 

$$ \hat{x}(k+1) = (A + MC) \hat{x}(k) + (B + MD) u(k) - M y(k) \tag{24} $$

Since $\hat{y}(k) = C\hat{x}(k) + Du(k)$, one can obtain Eq. (23) assuming zero initial conditions for 
the observer. Therefore, $\hat{y}(k)$ in Eq. (23) represents the estimated output provided by the 
observer. The estimation error $e(k)$ is the difference between the actual output and the 
estimated output provided by the observer. On the other hand, if the actual response $y(k)$ is 
replaced by the estimated value $\hat{y}(k)$ in Eq. (23), then the terms involving the observer gain 
M cancel each other identically for any arbitrary initial condition $\hat{x}(0)$, 

$$ \begin{aligned}
\hat{x}(k+1) &= (A + MC) \hat{x}(k) + (B + MD) u(k) - M \hat{y}(k) \\
             &= A \hat{x}(k) + B u(k)
\end{aligned} \tag{25} $$

Therefore, there is no longer any observer involved in the equation; $\hat{x}(k)$ now plays the 
role of the state vector $x(k)$ as in Eq. (7), and the estimated output $\hat{y}(k) = C\hat{x}(k) + Du(k)$ is 
the same as that produced by the open-loop model provided that the initial conditions for 
$\hat{x}(k)$ and $x(k)$ are identical. The quantity $\hat{y}(k)$ now represents the predicted output 
provided by the open-loop model alone, which is referred to in this paper as the open-loop 
prediction. In this case, the error $e(k)$ is the difference between the actual output and the 
predicted output provided by the identified open-loop model. 
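
The distinction between Eqs. (22) and (23) can be made concrete with a short sketch in Python with NumPy for a single-output model. The coefficient values, the noise level, and the function names are assumptions made for illustration; samples before the start of the record are taken as zero.

```python
import numpy as np

def parallel_model(alpha, beta, u):
    """Eq. (22): feeds back the model's own past estimates (recurrent use)."""
    p = len(alpha)
    y_hat = np.zeros(len(u))
    for k in range(len(u)):
        acc = sum(beta[i] * u[k - i] for i in range(p + 1) if k - i >= 0)
        acc += sum(alpha[i - 1] * y_hat[k - i] for i in range(1, p + 1) if k - i >= 0)
        y_hat[k] = acc
    return y_hat

def series_parallel_model(alpha, beta, u, y):
    """Eq. (23): uses the actual past outputs y (feedforward use)."""
    p = len(alpha)
    y_hat = np.zeros(len(u))
    for k in range(len(u)):
        acc = sum(beta[i] * u[k - i] for i in range(p + 1) if k - i >= 0)
        acc += sum(alpha[i - 1] * y[k - i] for i in range(1, p + 1) if k - i >= 0)
        y_hat[k] = acc
    return y_hat

alpha = [1.5, -0.7]            # illustrative coefficients alpha_1, alpha_2
beta = [0.5, 1.0, 0.3]         # illustrative coefficients beta_0..beta_2
rng = np.random.default_rng(3)
u = rng.standard_normal(100)

y = parallel_model(alpha, beta, u)                   # noise-free response from rest
y_meas = y + 0.01 * rng.standard_normal(len(u))      # measured output with noise

y_par = parallel_model(alpha, beta, u)               # open-loop prediction
y_sp = series_parallel_model(alpha, beta, u, y_meas) # driven by measurements
y_sp_clean = series_parallel_model(alpha, beta, u, y)
```

With exact coefficients and noise-free data from rest, both models reproduce the response. Once the measured output is noisy, the series-parallel estimate tracks the measurements while the parallel estimate depends on the input alone.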

Remark 4.1.1. First recall that the model structure in Eq. (15) subsumes an 
observer. If the parallel identification model is used in conjunction with the model structure 
of Eq. (15), then the prediction error that drives the parameter estimation scheme is simply 
the open-loop prediction error, not the observer (output) estimation error. Consequently, 
the observer portion of the model cannot be identified. This fact accounts for the 
difficulties encountered in parallel model identification, namely, the conditions for which 
the scheme will converge are presently not known. 

Remark 4.1.2. In the series-parallel identification model, since the actual instead of 
the (open-loop) predicted output enters the model, a feedforward network with delayed input 
and actual output measurements can be used to identify the system (see Fig. 10). This 
eliminates the use of a recurrent network, which would introduce additional but unnecessary 
difficulties into the system identification problem. Each output of the system is 
represented by a single linear neuron. A multiple-output system is represented by a single 
layer of neurons. The identified network can be used either as a feedforward or a recurrent 
network. In the former case, the network provides an estimate of the response by an 
observer. In the latter case, it is an open-loop predictor. Again, this depends on whether 
the actual or the predicted output is used in computing the response. 



Figure 10: Feedforward representation of the series-parallel identification model by a 
single neuron for each output. 

4.2. Identification of the Network Weights. This section shows how the 
weights of the network represented by Eq. (15) can be computed using a feedforward 
model. For linear systems, it is sufficient to use a one layer network having as many 
neurons as the number of outputs. This is a simple linear parameter estimation problem. 
The off-line computation is shown first, followed by an equivalent on-line computation. 


For simplicity, consider the case where the system starts from zero initial conditions. 
Equation (15) can be written as 

$$ y(k) = \sum_{i=1}^{p} [\, \beta_i, \; \alpha_i \,] \begin{bmatrix} u(k-i) \\ y(k-i) \end{bmatrix} + \beta_0 u(k) \tag{26} $$

where the network weight matrices $\beta_i$, $\alpha_i$ are defined in Eq. (16). Writing Eq. (26) in matrix 
form for a set of input-output data N + 1 samples long yields 

$$ y = \bar{Y} V \tag{27} $$

where 

$$ y = [\, y(0) \;\; y(1) \;\; \cdots \;\; y(p) \;\; y(p+1) \;\; \cdots \;\; y(N) \,] \tag{28} $$

$$ \bar{Y} = [\, \beta_0, \; \beta_1, \; \alpha_1, \; \beta_2, \; \alpha_2, \; \ldots, \; \beta_p, \; \alpha_p \,] \tag{29} $$

$$ V = \begin{bmatrix}
u(0) & u(1) & \cdots & u(p) & u(p+1) & \cdots & u(N) \\
     & v(0) & \cdots & v(p-1) & v(p) & \cdots & v(N-1) \\
     &      & \ddots & \vdots & \vdots &      & \vdots \\
     &      &        & v(0) & v(1) & \cdots & v(N-p)
\end{bmatrix}, \qquad v(k) = \begin{bmatrix} u(k) \\ y(k) \end{bmatrix} \tag{30} $$

with the blank entries of V equal to zero. The network weight matrices are estimated using the equation 

$$ \bar{Y} = y V^{+} \tag{31} $$

where $(\cdot)^{+}$ denotes the pseudo-inverse of the quantity in the parentheses. If the initial 
conditions are not zero, then a slightly different equation must be used to solve for the 
network weights, that is 

$$ \bar{Y} = y_1 V_1^{+} \tag{32} $$

where $y_1$ and $V_1$ are obtained by deleting the first p columns in y and V, respectively. 

Remark 4.2.1. The least-squares solution in Eq. (31) or (32) minimizes the error 
between the actual output and the estimated output computed using the actual input and 
output data, i.e., the least-squares solution minimizes the residual $e = y - \hat{y}$, where $\hat{y}$ is 
computed from $\hat{y} = \bar{Y} V$ and $\bar{Y}$ is given in Eq. (31). If Eq. (32) is used instead, then the 
least-squares solution minimizes $e_1 = y_1 - \hat{y}_1$, where $\hat{y}_1 = \bar{Y} V_1$. This computation, 
therefore, corresponds to the series-parallel identification scheme that minimizes the 
observer estimation error. 

Remark 4.2.2. Ideally, the error between the actual output and the predicted output 
provided by the identified open-loop model is the proper error to be minimized for the 
identification of the system open-loop model. The above computation minimizes the 
observer estimation error instead. For a linear system, it turns out that in the absence of
noises the open-loop system can be identified exactly by minimizing the observer
estimation error. In the presence of noises, however, minimizing the observer estimation
error does not necessarily imply that the open-loop prediction error is minimized.
Therefore, it is possible that the observer model fits the data well but the open-loop model 
does not. Fortunately, if the order of the regression equation is chosen to be sufficiently 
large then simultaneous observer and system identification will still be achieved in the limit 
as the data record tends to infinity and the noises are white, Gaussian, and zero-mean (see 
Ref. 9). 

Remark 4.2.3. The least-squares solution in Eq. (31) can be obtained by an on- 
line parameter estimation scheme. First, write each column in V as 

$$V = \begin{bmatrix} \Gamma(0) & \Gamma(1) & \Gamma(2) & \cdots \end{bmatrix} \qquad (33)$$

so that at each time step k, Eq. (27) can be written as

$$y(k) = Y \Gamma(k) \qquad (34)$$

The recursive least-squares equation for the network weights is simply

$$\hat{Y}(k) = \hat{Y}(k-1) + \left[ y(k) - \hat{Y}(k-1)\Gamma(k) \right] \frac{\Gamma(k)^T R(k-1)}{1 + \Gamma(k)^T R(k-1)\Gamma(k)} \qquad (35)$$

$$R(k) = R(k-1) - \frac{R(k-1)\Gamma(k)\Gamma(k)^T R(k-1)}{1 + \Gamma(k)^T R(k-1)\Gamma(k)} \qquad (36)$$

where $\hat{Y}(k) = [\hat{\beta}_0(k), \hat{\beta}_1(k), \hat{\alpha}_1(k), \hat{\beta}_2(k), \hat{\alpha}_2(k), \ldots, \hat{\beta}_p(k), \hat{\alpha}_p(k)]$, $\hat{Y}(0)$ is an arbitrary
initial guess, and R(0) is any symmetric positive definite matrix. The recursive equations
for (32) are analogous.
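
As an illustration of the off-line solution in Eq. (31) and the recursive form in Eqs. (35)-(36), the following Python sketch identifies the weights of a single linear neuron from simulated data. It is only a sketch under stated assumptions: NumPy is used, the system is a hypothetical second-order single-input single-output model with made-up coefficients, the data are noise-free, the initial conditions are zero, and the function names `build_regressors` and `rls_identify` are illustrative rather than taken from the paper.

```python
import numpy as np

def build_regressors(u, y, p):
    """Form V of Eq. (30) for scalar u, y.

    Column k is [u(k), u(k-1), y(k-1), ..., u(k-p), y(k-p)], with zeros
    where the index would fall before the start of the data.
    """
    n = len(u)
    V = np.zeros((2 * p + 1, n))
    V[0] = u
    for i in range(1, p + 1):
        V[2 * i - 1, i:] = u[: n - i]   # u(k-i)
        V[2 * i, i:] = y[: n - i]       # y(k-i)
    return V

def rls_identify(V, y, r0=1e6):
    """Recursive least squares of Eqs. (35)-(36), one column of V per step."""
    m = V.shape[0]
    Y_hat = np.zeros(m)        # arbitrary initial guess Y_hat(0)
    R = r0 * np.eye(m)         # R(0): any symmetric positive definite matrix
    for k in range(V.shape[1]):
        g = V[:, k]            # Gamma(k)
        denom = 1.0 + g @ R @ g
        Y_hat = Y_hat + (y[k] - Y_hat @ g) * (g @ R) / denom   # Eq. (35)
        R = R - np.outer(R @ g, g @ R) / denom                 # Eq. (36)
    return Y_hat

# Hypothetical weights, ordered as in Eq. (29): [beta0, beta1, alpha1, beta2, alpha2]
Y_true = np.array([1.5, -1.0, 0.8, 0.3, -0.2])
p, n = 2, 200
rng = np.random.default_rng(0)
u = rng.standard_normal(n)
y = np.zeros(n)
for k in range(n):             # simulate Eq. (26) from zero initial conditions
    y[k] = Y_true[0] * u[k]
    for i in range(1, min(p, k) + 1):
        y[k] += Y_true[2 * i - 1] * u[k - i] + Y_true[2 * i] * y[k - i]

V = build_regressors(u, y, p)
Y_batch = y @ np.linalg.pinv(V)   # Eq. (31): Y = y V^+
Y_rls = rls_identify(V, y)        # Eqs. (35)-(36)
print(np.round(Y_batch, 6))
print(np.round(Y_rls, 4))
```

With noise-free data both estimates should agree closely with the assumed weights. The large value chosen for R(0) makes the recursive estimate depend only weakly on the initial guess.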

Remark 4.2.4. In a multi-layer neural network, the back propagation algorithm is 
typically used to update the network weights recursively. This is a gradient-based 
parameter update algorithm. When expressed in the block diagram form for hardware 
implementation, the algorithm resembles the forward network except that the signal travels 
in the opposite direction, leading to the name back propagation. In the present case, 
because the network is simply one layer of linear neurons, it is not efficient to use the back 
propagation algorithm to compute the network weights. For on-line implementation, the 
least squares algorithm given in Eqs. (35-36) or its variants for fast computation are 
preferred. 

5. Predictor Models For Linear Dynamic Systems 


In theory, the open-loop model can be used to predict the system response based on 
current and past input values. However, this is not desirable in practice because of the 
requirement that both the open-loop model and the initial conditions be known exactly. 
Such a prediction is also sensitive to noises. On the other hand, an observer which is 
typically used to estimate the system state based on actual input-output data can also be 
used to provide an estimate of the system output. This section first discusses the use of an
observer as a one-step ahead predictor. This interpretation is important because of its
connection to the control problem which will be discussed in later sections. Extensions to
the identification and use of multiple-step ahead predictors will then be made. For linear
systems, these predictors are simply special single-layer networks of linear neurons.

5.1. One-Step Ahead Predictor. To express explicitly the observer as a
one-step ahead predictor, one simply writes the observer equations as


$$\begin{aligned}
\hat{x}(k+1) &= (A + MC)\hat{x}(k) + (B + MD)u(k) - My(k) \\
\hat{y}(k+1) &= C\hat{x}(k+1) + Du(k+1)
\end{aligned} \qquad (37)$$

As a predictor, the quantity of interest is $\hat{y}(k+1)$. One can therefore bypass the state
equation by writing

$$\hat{y}(k+1) = \alpha_1 y(k) + \cdots + \alpha_p y(k-p+1) + \beta_0 u(k+1) + \beta_1 u(k) + \cdots + \beta_p u(k-p+1) \qquad (38)$$

The following remarks can be made regarding the forms of Eq. (37) and Eq. (38). 

Remark 5.1.1. In theory, if the state space model (A, B, C, D) is known exactly 
then one can design an observer gain M such that A + MC is asymptotically stable. To use 
Eq. (38) as an output predictor, one needs to include a sufficient number of terms such that
$(A + MC)^i$ is negligible for $i \geq p$. The state space representation in Eq. (37) is a better
choice since it involves no such approximation. The above comment no longer holds true
if M is such that A + MC is deadbeat, i.e., $(A + MC)^i \equiv 0$ for $i \geq p$, since the approximation
becomes exact in this case.

Remark 5.1.2. In practice, the system model cannot be known exactly. To 
identify the system from input-output data using the series-parallel structure, one in fact 
computes directly the coefficients in Eq. (38) rather than the state space matrices. To obtain 
a minimal order state space representation from these coefficients, realization is required. 
As an output predictor, therefore, Eq. (38) should be used directly because conversion to a 
state space representation is not necessary for this purpose. 

Remark 5.1.3. Equation (38) clearly indicates that the one-step ahead predictor 
takes the form of a single layer network of linear neurons with actual input and output 
signals entering the network, and the output of the network represents the one-step ahead 
prediction. Schematically, this is the same as shown in Fig. 10. 

5.2. A Two-Step Ahead Predictor. This section derives the equations for a 
two-step ahead predictor for linear systems, and shows that it also has a linear neural 
network form. First, from Eq. (7), one can write 

$$\begin{aligned}
x(k+2) &= Ax(k+1) + Bu(k+1) \\
&= A^2 x(k) + ABu(k) + Bu(k+1) \\
y(k+2) &= Cx(k+2) + Du(k+2)
\end{aligned} \qquad (39)$$

Adding and subtracting the term Gy(k) on the right hand side of the state equation yields

$$\begin{aligned}
x(k+2) &= A^2 x(k) + ABu(k) + Bu(k+1) + Gy(k) - Gy(k) \\
&= (A^2 + GC)x(k) + \begin{bmatrix} AB + GD & B \end{bmatrix} \begin{bmatrix} u(k) \\ u(k+1) \end{bmatrix} - Gy(k)
\end{aligned} \qquad (40)$$


If G is a matrix such that $A^2 + GC$ is deadbeat of order p, i.e.,

$$(A^2 + GC)^k = 0, \quad k \geq p \qquad (41)$$

then the relationship between the input and output of the system can be described as a linear
combination of input-output data of the form

$$y(k+2) = g(y(k), y(k-2), y(k-4), u(k+2), u(k+1), u(k), \ldots) \qquad (42)$$

for k > 2p - 2 so that at sufficiently large time steps, terms involving the states x(0) and
x(1) vanish, i.e., $C(A^2 + GC)^i x(0) = 0$, $C(A^2 + GC)^i x(1) = 0$, for $i \geq p$ due to the
imposed deadbeat condition for $A^2 + GC$. Furthermore, there is only a finite number of
coefficients that make up the linear combination in g(.), which are the predictor Markov
parameters of the form

$$G(k) = C(A^2 + GC)^{k-1} \begin{bmatrix} AB + GD & B & -G \end{bmatrix}, \quad k = 1, 2, \ldots, p \qquad (43)$$

Existence of the matrix G such that $(A^2 + GC)^k = 0$, $k \geq p$ is assured if the pair $(A^2, C)$
is observable.

Remark 5.2.1. The above derivation justifies the form of a two-step ahead 
predictor. In fact, one can identify the coefficients of this predictor from input-output data 
by minimizing the two-step prediction error. The procedure is similar to that discussed in 
Section 4. 

Remark 5.2.2. To obtain the two-step ahead prediction one can also propagate the 
observer, which is a one-step ahead predictor, in two successive time steps by treating the 
estimated output from the first time step as the actual output for the second time step.
However, such a procedure would amount to performing open-loop prediction in the
second time step and is therefore sensitive to noises. On the other hand, if one uses the
predictor form with the coefficients directly identified from input-output data, then only
actual data enter the computation, which minimizes the errors due to noises.

Remark 5.2.3. Again, the predictor form can be represented by a single layer of 
linear neurons, and the weights of this network are simply the elements of the predictor 
Markov parameters shown in Eq. (43). Results presented in this section can be easily 
generalized to a general multi-step predictor. The relationship between such predictors and 
the deadbeat control problem will be discussed in a later reference. 
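
The two-step ahead predictor form can be checked on a small case. The following Python sketch assumes a hypothetical scalar system with A = a, B = b, C = 1, D = 0 (values chosen only for illustration). Choosing G = -a^2 makes A^2 + GC = 0, so Eq. (40) reduces to y(k+2) = a^2 y(k) + a b u(k) + b u(k+1). The sketch identifies these predictor coefficients directly from input-output data by least squares, in the spirit of Remark 5.2.1.

```python
import numpy as np

a, b = 0.9, 0.5                  # hypothetical scalar system: x(k+1) = a x(k) + b u(k), y(k) = x(k)
rng = np.random.default_rng(1)
n = 300
u = rng.standard_normal(n)
y = np.zeros(n)
y[0] = 0.7                       # non-zero initial condition
for k in range(n - 1):
    y[k + 1] = a * y[k] + b * u[k]

# Regress y(k+2) on actual data y(k), u(k), u(k+1).
# The u(k+2) term of Eq. (42) drops out here because D = 0 is assumed.
Phi = np.vstack([y[:-2], u[:-2], u[1:-1]])   # shape (3, n-2)
target = y[2:]
w = target @ np.linalg.pinv(Phi)

# Coefficients predicted by Eq. (40) with G = -a^2
w_expected = np.array([a**2, a * b, b])
print(w, w_expected)
```

Only measured inputs and outputs enter the regression, so no intermediate one-step prediction is propagated open loop.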

6. Control of Linear Systems using Neural Networks 

As formulated in previous sections, linear systems can be represented by a single- 
layer network of linear neurons. The weights of this network can be identified from input- 
output data. Once identified, the network can be used as a one-step ahead predictor. This 
section discusses the use of such a network directly for control applications without requiring
the state space model to be extracted from these weights. 

6.1. A One-Step Ahead Controller. First, consider the case where the 
linear system can be expressed in the form, 

$$y(k+1) = \alpha_1 y(k) + \cdots + \alpha_p y(k-p+1) + \beta_0 u(k+1) + \beta_1 u(k) + \cdots + \beta_p u(k-p+1) \qquad (44)$$

where the coefficients are assumed to be known. Let the desired response be denoted by
r(k). To obtain a controller directly from the above equation, one simply replaces y(k + 1)
by its desired value r(k + 1) and then solves for the control input u(k + 1) to obtain the control
law,

$$u(k+1) = \beta_0^{-1} \left[ r(k+1) - \alpha_1 y(k) - \cdots - \alpha_p y(k-p+1) - \beta_1 u(k) - \cdots - \beta_p u(k-p+1) \right] \qquad (45)$$

If the coefficients in Eq. (44) are not known exactly, then Eq. (46) represents a one-step
ahead estimate of what the system will produce based on current and past input-output data,

$$\hat{y}(k+1) = \hat{\alpha}_1 y(k) + \cdots + \hat{\alpha}_p y(k-p+1) + \hat{\beta}_0 u(k+1) + \hat{\beta}_1 u(k) + \cdots + \hat{\beta}_p u(k-p+1) \qquad (46)$$

The control law then is simply

$$u(k+1) = \hat{\beta}_0^{-1} \left[ r(k+1) - \hat{\alpha}_1 y(k) - \cdots - \hat{\alpha}_p y(k-p+1) - \hat{\beta}_1 u(k) - \cdots - \hat{\beta}_p u(k-p+1) \right] \qquad (47)$$
The behavior of the closed-loop system using the above control law can be examined as 
follows. Let e(k+ 1) denote the error between the actual response and the predicted 
response. 


$$\hat{y}(k+1) = y(k+1) - e(k+1) \qquad (48)$$

Substituting Eq. (47) and Eq. (48) into Eq. (46) produces

$$\begin{aligned}
y(k+1) - e(k+1) &= \hat{\alpha}_1 y(k) + \cdots + \hat{\alpha}_p y(k-p+1) + \hat{\beta}_1 u(k) + \cdots + \hat{\beta}_p u(k-p+1) \\
&\quad + \hat{\beta}_0 \left\{ \hat{\beta}_0^{-1} \left[ r(k+1) - \hat{\alpha}_1 y(k) - \cdots - \hat{\alpha}_p y(k-p+1) - \hat{\beta}_1 u(k) - \cdots - \hat{\beta}_p u(k-p+1) \right] \right\} \\
&= r(k+1)
\end{aligned} \qquad (49)$$

or

$$y(k+1) = r(k+1) + e(k+1) \qquad (50)$$

Defining the tracking error to be the difference between the actual response and the desired
response, $\varepsilon(k) = y(k) - r(k)$, Eq. (50) reveals that

$$\varepsilon(k+1) = e(k+1) \qquad (51)$$

Therefore, if the predictor is such that its prediction error vanishes in the limit, then the
tracking error also vanishes in the limit, i.e.,

$$\lim_{k \to \infty} e(k) = 0 \;\Rightarrow\; \lim_{k \to \infty} \varepsilon(k) = 0 \qquad (52)$$

Remark 6.1.1. The one-step ahead control law has the property that the tracking 
error is the same as the prediction error. The above analysis shows that accuracy of the 
predictor model governs the accuracy of the tracking response. As long as the predictor 
can perform a reasonably good one-step ahead prediction of the system response then the 
control input can be computed to make the system track a desired trajectory. In the ideal 
case where the system is linear and the data is noise-free, the prediction error and the
tracking error will be zero identically. Non-zero prediction and tracking error can only
come about during adaptation or when noises are present. This is different from the non- 
linear case where both the estimation and tracking error are non-zero even when there are 
no noises in the system. An important restriction of the one-step ahead controller is that the 
open-loop system is required to be stably invertible, (i.e., there are no unstable zeros in the 
linear case). If this condition is not met, it is possible to have the controller producing 
unbounded input while maintaining zero tracking error. 
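
The one-step ahead control law and the stable-invertibility restriction can be illustrated with a short simulation. The Python sketch below assumes a hypothetical first-order (p = 1) instance of Eq. (44) with known, made-up coefficients and a sinusoidal desired response. The helper name `simulate` is illustrative. The first run places the system zero at -0.3, inside the unit circle, and the second moves it to -1.5.

```python
import numpy as np

def simulate(alpha1, beta0, beta1, n=50):
    """One-step ahead control, Eq. (45), of a first-order (p = 1) instance of Eq. (44)."""
    r = np.sin(np.arange(n) / (2 * np.pi))   # desired response (assumed)
    u = np.zeros(n)
    y = np.zeros(n)
    for k in range(n - 1):
        # Eq. (45): set y(k+1) = r(k+1) and solve for u(k+1)
        u[k + 1] = (r[k + 1] - alpha1 * y[k] - beta1 * u[k]) / beta0
        # Plant response, Eq. (44)
        y[k + 1] = alpha1 * y[k] + beta0 * u[k + 1] + beta1 * u[k]
    return r, u, y

# Hypothetical coefficients: the zero at -beta1/beta0 = -0.3 lies inside the unit circle
r, u, y = simulate(alpha1=0.5, beta0=1.0, beta1=0.3)
# Same plant with the zero moved to -1.5, outside the unit circle
r2, u2, y2 = simulate(alpha1=0.5, beta0=1.0, beta1=1.5)
print(np.max(np.abs(y[1:] - r[1:])), np.max(np.abs(u)), np.max(np.abs(u2)))
```

In the first case the response follows the desired trajectory with a bounded input. In the second case the control law has an unstable pole at -1.5, so the input grows from step to step even though the law is still constructed to match r(k + 1) at every step.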

Remark 6.1.2. To obtain the result in Eq. (51), the controller coefficients must be 
the same as those of the predictor model. In the event the coefficients of the predictor 
model are updated at each time step, then the controller coefficients must also match those 
of the predictor model. Mathematically, if at time step k, the predictor takes the form 

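$$\begin{aligned}
\hat{y}(k+1) &= \hat{\alpha}_1(k) y(k) + \cdots + \hat{\alpha}_p(k) y(k-p+1) \\
&\quad + \hat{\beta}_0(k) u(k+1) + \hat{\beta}_1(k) u(k) + \cdots + \hat{\beta}_p(k) u(k-p+1)
\end{aligned} \qquad (53)$$

then the control law will be taken to be

$$\begin{aligned}
u(k+1) = \hat{\beta}_0(k)^{-1} \big[ & r(k+1) - \hat{\alpha}_1(k) y(k) - \cdots - \hat{\alpha}_p(k) y(k-p+1) \\
& - \hat{\beta}_1(k) u(k) - \cdots - \hat{\beta}_p(k) u(k-p+1) \big]
\end{aligned} \qquad (54)$$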

Remark 6.1.3. The above controller can be implemented in neural network form. 
Such a controller simply copies the weights of the feedforward predictor network to 
generate the control input. This is shown schematically in Fig. 11 below.

Figure 11: Adaptive one-step ahead controller.

Remark 6.1.4. Since the controller attempts to make the system track the desired 
trajectory in one step, excessive control efforts are usually required. This makes the 
approach unattractive in practice. To alleviate this problem, the weighted one-step ahead 
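controller is used, such that at each time step the control input minimizes the following
quadratic cost function

$$J(k+1) = \tfrac{1}{2}\, \varepsilon(k+1)^T Q\, \varepsilon(k+1) + \tfrac{1}{2}\, u(k+1)^T S\, u(k+1) \qquad (55)$$

where the tracking error $\varepsilon(k+1) = y(k+1) - r(k+1)$. The weighting matrices Q and S are
required to be symmetric and positive definite. Substituting the expressions for $\varepsilon(k+1)$ and
$\hat{y}(k+1)$ into the cost function and then performing the minimization produces

$$\begin{aligned}
\frac{\partial J(k+1)}{\partial u(k+1)} &= \hat{\beta}_0^T Q \hat{\alpha}_1 y(k) + \hat{\beta}_0^T Q \hat{\alpha}_2 y(k-1) + \cdots + \hat{\beta}_0^T Q \hat{\alpha}_p y(k-p+1) + \hat{\beta}_0^T Q \hat{\beta}_0 u(k+1) \\
&\quad + \hat{\beta}_0^T Q \hat{\beta}_1 u(k) + \cdots + \hat{\beta}_0^T Q \hat{\beta}_p u(k-p+1) - \hat{\beta}_0^T Q r(k+1) + S u(k+1)
\end{aligned} \qquad (56)$$

Setting the result to zero and solving for the control input yields

$$\begin{aligned}
u(k+1) = (\hat{\beta}_0^T Q \hat{\beta}_0 + S)^{-1} \hat{\beta}_0^T Q \big[ & r(k+1) - \hat{\alpha}_1 y(k) - \cdots - \hat{\alpha}_p y(k-p+1) \\
& - \hat{\beta}_1 u(k) - \cdots - \hat{\beta}_p u(k-p+1) \big]
\end{aligned} \qquad (57)$$

The above is known as a weighted one-step ahead controller in the adaptive control literature (Ref. 6).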
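
For scalar signals the weighted control law in Eq. (57) is a one-line formula, and its effect can be seen in a single time step. The Python sketch below uses hypothetical values, and the names `weighted_control`, `cost` and `free_response` are illustrative. Here `free_response` stands for the sum of the alpha and beta_1, ..., beta_p terms of the predictor, which do not depend on u(k + 1). The case S = 0 is included only as the limiting case that recovers Eq. (47).

```python
import numpy as np

def weighted_control(r_next, free_response, beta0, Q, S):
    """Eq. (57) for scalar signals."""
    return beta0 * Q * (r_next - free_response) / (beta0 * Q * beta0 + S)

def cost(u_next, r_next, free_response, beta0, Q, S):
    """Eq. (55), with the predicted output free_response + beta0 * u_next."""
    eps = free_response + beta0 * u_next - r_next
    return 0.5 * Q * eps**2 + 0.5 * S * u_next**2

beta0, Q = 1.5, 1.0
r_next, free_response = 1.0, 0.2      # hypothetical values at one time step

u_exact = weighted_control(r_next, free_response, beta0, Q, S=0.0)     # limiting case, Eq. (47)
u_weighted = weighted_control(r_next, free_response, beta0, Q, S=0.5)  # penalized control effort
J_min = cost(u_weighted, r_next, free_response, beta0, Q, 0.5)
print(u_exact, u_weighted, J_min)
```

A positive S trades some predicted tracking accuracy for a smaller control input, which is the purpose stated in Remark 6.1.4.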

6.2. Model Reference Controller. A different way to avoid the requirement 
that the system track a desired trajectory in one step is to use a control scheme known as 
model reference control. Let the control law in Eq. (47) be modified as 

$$\begin{aligned}
u(k+1) = \hat{\beta}_0^{-1} \big[ & r(k+1) - \hat{\alpha}_1 y(k) - \cdots - \hat{\alpha}_p y(k-p+1) - \hat{\beta}_1 u(k) - \cdots - \hat{\beta}_p u(k-p+1) \\
& - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) \big]
\end{aligned} \qquad (58)$$

Substituting Eq. (48) and Eq. (58) into Eq. (46) yields

$$\begin{aligned}
y(k+1) - e(k+1) &= \hat{\alpha}_1 y(k) + \cdots + \hat{\alpha}_p y(k-p+1) + \hat{\beta}_1 u(k) + \cdots + \hat{\beta}_p u(k-p+1) \\
&\quad + \hat{\beta}_0 \big\{ \hat{\beta}_0^{-1} \big[ -\hat{\alpha}_1 y(k) - \cdots - \hat{\alpha}_p y(k-p+1) - \hat{\beta}_1 u(k) - \cdots - \hat{\beta}_p u(k-p+1) \\
&\qquad\qquad - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) + r(k+1) \big] \big\} \\
&= -\gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) + r(k+1)
\end{aligned} \qquad (59)$$

which can be expressed as

$$y(k+1) + \gamma_1 y(k) + \cdots + \gamma_p y(k-p+1) = r(k+1) + e(k+1) \qquad (60)$$

The system response y(k) now no longer follows the reference input r(k) directly as in the
case in Eq. (49). Its behavior can be conveniently interpreted in terms of a reference
model. Define $y_m(k)$ as the response of a reference model when driven by the reference
input r(k),

$$y_m(k+1) + \gamma_1 y_m(k) + \cdots + \gamma_p y_m(k-p+1) = r(k+1) \qquad (61)$$

and the tracking error $\varepsilon_m(k)$ as the difference between the system response y(k) and the
reference model response $y_m(k)$,

$$\varepsilon_m(k) = y(k) - y_m(k) \qquad (62)$$

The equation that governs the behavior of this tracking error is obtained by subtracting Eq.
(61) from Eq. (60),

$$\varepsilon_m(k+1) + \gamma_1 \varepsilon_m(k) + \cdots + \gamma_p \varepsilon_m(k-p+1) = e(k+1) \qquad (63)$$

Therefore, convergence of the prediction error to zero implies convergence of the tracking
error to zero provided that the characteristic equation governing the homogeneous part of
the difference equation is asymptotically stable,

$$\lambda^p + \gamma_1 \lambda^{p-1} + \cdots + \gamma_p = 0 \qquad (64)$$

This requirement is easily satisfied since the coefficients $\gamma_1, \gamma_2, \ldots, \gamma_p$ are the design
variables to be selected a priori.
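
The role of the reference model coefficients can be seen in a small simulation. The Python sketch below assumes a hypothetical first-order plant whose coefficients are known exactly (so the prediction error e is zero), a first-order reference model with gamma_1 = -0.4, and a plant that starts away from the reference model. Under these assumptions Eq. (63) predicts that the tracking error decays as 0.4^k, where 0.4 is the root of Eq. (64) for p = 1.

```python
import numpy as np

alpha1, beta0, beta1 = 0.5, 1.0, 0.3   # hypothetical plant coefficients, assumed known
gamma1 = -0.4                          # reference model coefficient; root of Eq. (64) is 0.4
n = 30
r = np.sin(np.arange(n) / (2 * np.pi))
u, y, y_m = np.zeros(n), np.zeros(n), np.zeros(n)
y[0] = 1.0                             # plant starts away from the reference model (y_m(0) = 0)

for k in range(n - 1):
    # Control law, Eq. (58), with exact coefficients
    u[k + 1] = (r[k + 1] - alpha1 * y[k] - beta1 * u[k] - gamma1 * y[k]) / beta0
    y[k + 1] = alpha1 * y[k] + beta0 * u[k + 1] + beta1 * u[k]   # plant, Eq. (44)
    y_m[k + 1] = -gamma1 * y_m[k] + r[k + 1]                     # reference model, Eq. (61)

eps_m = y - y_m                        # tracking error, Eq. (62)
print(eps_m[:5])
```

Moving the root of Eq. (64) closer to the origin speeds up the decay of the tracking error at the price of larger corrective inputs, and a root at the origin recovers the one-step ahead controller.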

Remark 6.2. 1 . The difference between this case and the previous case is that the 
desired trajectory is not specified by the reference input r(k), but rather by the response of 
the reference model. Since the reference model is known, the reference input r(k) that is
needed to make the reference model produce the desired response can be easily computed.
The reference model is introduced to slow down the convergence of the tracking
error so that excessive correction during the adaptation process does not occur.

Remark 6.2.2. The model reference control scheme can also be implemented in 
neural network form. At any time step, the controller network copies the coefficients of the 
predictor network and uses them in the generation of the control input. The configuration 
for this control scheme is shown in Fig. 12. 

Remark 6.2.3. Equation (63) shows that the prediction error acts as a driving term 
for the difference equation that governs the behavior of the tracking error. If the reference 
model coefficients are designed such that the homogeneous solution is asymptotically stable 
then the steady state tracking error is simply the particular solution of the difference 
equation. One thus has the ability to affect the steady state tracking error through the 
reference model coefficients. However, this freedom is constrained by the residual
dynamics of the prediction error, in that the steady state tracking error may be amplified or
reduced depending on those dynamics. Generally speaking, the natural frequencies of the
reference model should be placed away from those dominating the residual dynamics.

Remark 6.2.4. If the coefficients of the predictor model are updated at each time 
step, then the controller coefficients must match those of the predictor model at each time 
step. The resulting integration between parameter estimation and control computation is 
known as model reference adaptive control. The adaptive scheme is summarized in the 
following equations where the ordinary least-squares algorithm is used to perform the 
parameter estimation step. Again, let Y (k) denote the estimated coefficients of the predictor 
model at time step k. 


$$\hat{Y}(k) = [\hat{\beta}_0(k), \hat{\beta}_1(k), \hat{\alpha}_1(k), \hat{\beta}_2(k), \hat{\alpha}_2(k), \ldots, \hat{\beta}_p(k), \hat{\alpha}_p(k)] \qquad (65)$$

starting with $\hat{Y}(0)$ as an arbitrary initial guess. The control input is computed from

$$\begin{aligned}
u(k+1) = \hat{\beta}_0(k)^{-1} \big[ & -\hat{\alpha}_1(k) y(k) - \cdots - \hat{\alpha}_p(k) y(k-p+1) - \hat{\beta}_1(k) u(k) - \cdots \\
& - \hat{\beta}_p(k) u(k-p+1) - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) + r(k+1) \big]
\end{aligned} \qquad (66)$$

where the reference model coefficients $\gamma_1, \gamma_2, \ldots, \gamma_p$ are time-invariant and chosen a
priori. The above control input is applied to the system producing response y(k + 1). The
predictor coefficients are then updated according to the rule

$$\hat{Y}(k+1) = \hat{Y}(k) + \left[ y(k+1) - \hat{Y}(k)\Gamma(k+1) \right] \frac{\Gamma(k+1)^T R(k)}{1 + \Gamma(k+1)^T R(k)\Gamma(k+1)} \qquad (67)$$

$$R(k+1) = R(k) - \frac{R(k)\Gamma(k+1)\Gamma(k+1)^T R(k)}{1 + \Gamma(k+1)^T R(k)\Gamma(k+1)} \qquad (68)$$

starting with R(0) as any symmetric positive definite matrix. The newly estimated
parameters are then used to compute the control input for the next time step u(k + 2).

Figure 12: Model reference adaptive control.
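
The adaptive loop of Eqs. (65)-(68) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the configuration used in the paper: the plant is a hypothetical first-order system with made-up coefficients, the data are noise-free, the reference input is a sinusoid, and the initial estimate of beta_0 is set to a non-zero value (0.5) because Eq. (66) divides by it. No safeguard against the estimate of beta_0 passing through zero is included.

```python
import numpy as np

alpha1, beta0, beta1 = 0.5, 1.0, 0.3    # hypothetical plant, unknown to the controller
gamma1 = -0.4                           # reference model: y_m(k+1) + gamma1*y_m(k) = r(k+1)
n = 100
r = np.sin(np.arange(n) / (2 * np.pi))
u, y, y_m = np.zeros(n), np.zeros(n), np.zeros(n)

Y_hat = np.array([0.5, 0.0, 0.0])       # [beta0_hat, beta1_hat, alpha1_hat]; beta0_hat must be non-zero
R = 1e6 * np.eye(3)                     # R(0): symmetric positive definite

for k in range(n - 1):
    b0, b1, a1 = Y_hat
    # Control law, Eq. (66), using the current estimates
    u[k + 1] = (-a1 * y[k] - b1 * u[k] - gamma1 * y[k] + r[k + 1]) / b0
    # Plant and reference model responses
    y[k + 1] = alpha1 * y[k] + beta0 * u[k + 1] + beta1 * u[k]
    y_m[k + 1] = -gamma1 * y_m[k] + r[k + 1]
    # Recursive least-squares update, Eqs. (67)-(68)
    g = np.array([u[k + 1], u[k], y[k]])          # Gamma(k+1)
    denom = 1.0 + g @ R @ g
    Y_hat = Y_hat + (y[k + 1] - Y_hat @ g) * (g @ R) / denom
    R = R - np.outer(R @ g, g @ R) / denom

tracking_error = y - y_m
print(np.max(np.abs(tracking_error[:5])), np.max(np.abs(tracking_error[50:])))
```

In this noise-free sketch the tracking error is noticeable during the first few steps, while the weights are still being learned, and becomes small afterwards. The estimated weights are used by the controller at every step, as described in Remark 6.2.4.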

Remark 6.2.5. The control schemes discussed in this section deal with a one-step
ahead predictor model of the form shown in Eq. (38). The previous section shows that a
two-step ahead predictor or a multi-step ahead predictor has the same linear form.
Therefore, the results presented in this section can be easily extended to these predictors.
For example, the two-step ahead controller will compute the control u(k + 2) requiring the
measurements up to y(k) only.

7. Modelling and Control of Non-Linear Systems 

Up to this point, the discussion has been restricted to linear systems. It has been 
shown that for linear systems, it is not necessary to have a complicated neural network for 
identification and control, but rather a single layer of linear neurons is adequate. This 
section extends the results to non-linear systems. The significance of linear predictor 
models for non-linear systems will be discussed. This has an important implication on the 
extent to which linear techniques can be used to handle non-linear systems. The modelling 
and control of non-linear systems using non-linear neural networks will then be examined. 

7.1. Linear Predictor Models for Non-Linear Systems. The predictor 
model derived in Section 5 is based on the open-loop state space model which is a linear 
representation. For linear systems, the identified coefficients of a one-step ahead
predictor take a particular form, namely, the Markov parameters of an observer model
consisting of the open-loop state space model and an observer gain. Recall that a predictor
uses actual input and output data to compute the predicted response at each time step. To
qualify as a valid open-loop model as well, the predictor must also accurately predict the
system response in an open-loop test using input data alone. As mentioned previously, for
linear systems, the predictor is also valid as an open-loop model because it can also
produce correct open-loop prediction. For a non-linear system, this is no longer the case.
However, when the predicted response is modeled as a linear combination of past input and 
output data, it turns out that surprisingly good prediction can still be obtained even for non- 
linear systems. Such predictor models do not qualify as open-loop models of the actual 
system because they do not predict correct open-loop response using input data alone. This 
point will be further illustrated by a numerical example in a later section. 
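
The distinction between one-step ahead prediction and open-loop prediction can be made concrete with a small experiment. The Python sketch below assumes a hypothetical scalar non-linear system (a tanh non-linearity chosen only for illustration, not the example of Section 8.3) and fits a first-order linear predictor by least squares in the series-parallel arrangement. It then compares the one-step ahead error, which uses actual past outputs, with the open-loop error, which feeds the model its own past predictions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
u = rng.standard_normal(n)
y = np.zeros(n)
for k in range(n - 1):
    y[k + 1] = 0.8 * np.tanh(y[k]) + 0.5 * u[k]   # hypothetical non-linear system

# Series-parallel least-squares fit of the linear predictor y(k+1) = a*y(k) + b*u(k)
Phi = np.vstack([y[:-1], u[:-1]])
a_hat, b_hat = y[1:] @ np.linalg.pinv(Phi)

# One-step ahead prediction error: actual past outputs enter the predictor
e_one_step = y[1:] - (a_hat * y[:-1] + b_hat * u[:-1])

# Open-loop prediction error: the model is driven by the input alone
y_ol = np.zeros(n)
for k in range(n - 1):
    y_ol[k + 1] = a_hat * y_ol[k] + b_hat * u[k]
e_open_loop = y - y_ol

def rms(e):
    return np.sqrt(np.mean(e**2))

print(rms(e_one_step), rms(e_open_loop[1:]), rms(y[1:]))
```

The open-loop error obeys e_ol(k+1) = e_1(k+1) + a_hat e_ol(k), so the one-step residuals are accumulated through the model dynamics. In a sketch of this kind the open-loop error is usually the larger of the two, although this ordering is not guaranteed in general.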

7.2. Control of Non-Linear Systems Using Predictor Models. In this 
section, we show that the model reference controller considered in Section 6.2, or its 
special version, the one-step ahead controller in Section 6.1, can be used to control a class 
of non-linear systems which can be represented by linear predictors of the form considered 
in Eq. (38). Suppose that the non-linear system can be represented by a non-linear
auto-regressive model of the form

$$y(k+1) = f(y(k), y(k-1), \ldots, u(k+1), u(k), u(k-1), \ldots) \qquad (69)$$

where f(.) is some non-linear function of past input and output data. First, note that for the
response of the system to follow that of a reference model,

$$y_m(k+1) + \gamma_1 y_m(k) + \cdots + \gamma_p y_m(k-p+1) = r(k+1) \qquad (70)$$

we require that the response of the controlled system be described by

$$y(k+1) + \gamma_1 y(k) + \cdots + \gamma_p y(k-p+1) = r(k+1) \qquad (71)$$

so that the tracking error, $\varepsilon_m(k) = y(k) - y_m(k)$, will be governed by

$$\varepsilon_m(k+1) + \gamma_1 \varepsilon_m(k) + \cdots + \gamma_p \varepsilon_m(k-p+1) = 0 \qquad (72)$$

Therefore, at time step k + 1, one wishes to determine the control input u(k + 1) such that Eq.
(71) is satisfied. Since the relationship between y(k + 1) and u(k + 1) is non-linear and is not
known, one cannot solve for u(k + 1) directly. However, if the non-linear system is such
that there exists a predictor of the form given in Eq. (44) such that $\hat{y}(k+1) \approx y(k+1)$, then
one satisfies the following equation,

$$\hat{y}(k+1) + \gamma_1 y(k) + \cdots + \gamma_p y(k-p+1) = r(k+1) \qquad (73)$$

instead of Eq. (71). The control law is then determined by substituting Eq. (46) in Eq.
(73) and then solving for u(k + 1), producing exactly the same control law as given in Eq.
(66). The fact that the control law satisfies Eq. (73) instead of Eq. (71) will make the
tracking error equation governed by Eq. (63) instead of Eq. (72), where e(k + 1) denotes the
prediction error defined in Eq. (48).

7.3. Identification and Control of Non-Linear Systems using Non-Linear
Neural Networks. The identification and control scheme discussed in this
paper can be extended to include non-linear neurons. The basic assumption is that the
system response is a non-linear function of previous input and output data which can be
represented by a multi-layer feedforward network having a sufficient number of non-linear
neurons. Let the non-linear function be denoted by f(.) and its neural network
representation by N(.),

$$\begin{aligned}
y(k+1) &= f(y(k), y(k-1), \ldots, u(k+1), u(k), u(k-1), \ldots) \\
&= N(y(k), y(k-1), \ldots, u(k+1), u(k), u(k-1), \ldots)
\end{aligned} \qquad (74)$$

When a non-linear network with a sufficiently large number of hidden layers is used, it
may also qualify as an open-loop model of the non-linear system in addition to being a
one-step ahead predictor. This is the fundamental difference between identification using a linear
network versus a non-linear network. Generally speaking, the theoretical advantage of
using a non-linear network for non-linear system identification is offset by the difficulties
in finding such a network in practice. Neither the number of hidden layers nor the number
of neurons in each layer is known a priori. For a chosen network configuration, the back
propagation algorithm is often used to determine the network weights. Typically, the
convergence rate is slow and a large amount of data is needed. The back propagation
algorithm is well-known and discussed extensively in the literature.

In the model reference control problem, the theoretical advantage of a non-linear 
network is somewhat diminished because the open-loop model need not be found for 
purpose of tracking control. The model reference control scheme can accommodate a non- 
linear network rather easily. Assume that the network representing the non-linear system 
can be expressed in the form, 

$$\begin{aligned}
y(k+1) &= N(y(k), y(k-1), \ldots, u(k+1), u(k), u(k-1), \ldots) \\
&= N_1(y(k), y(k-1), \ldots, u(k), u(k-1), \ldots) + \beta_0(k) u(k+1) + e_1(k+1)
\end{aligned} \qquad (75)$$

where $e_1(k+1)$ denotes the fitting error introduced with the separation of the u(k + 1) term
from N(.). The control input is computed from

$$u(k+1) = \beta_0(k)^{-1} \left[ r(k+1) - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) - N_1(\cdot) \right] \qquad (76)$$

where $\gamma_1, \ldots, \gamma_p$ are the coefficients of the reference model representing the desired
response. The control input when applied to the system yields the closed-loop response,

$$\begin{aligned}
y(k+1) &= N_1(\cdot) + \beta_0(k) \left\{ \beta_0(k)^{-1} \left[ r(k+1) - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) - N_1(\cdot) \right] \right\} + e_1(k+1) \\
&= N_1(\cdot) + r(k+1) - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) - N_1(\cdot) + e_1(k+1) \\
&= r(k+1) - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) + e_1(k+1)
\end{aligned} \qquad (77)$$

The tracking error, $\varepsilon_m(k) = y(k) - y_m(k)$, where $y_m(k)$ is the response of the reference
model, is governed by the difference equation

$$\varepsilon_m(k+1) + \gamma_1 \varepsilon_m(k) + \cdots + \gamma_p \varepsilon_m(k-p+1) = e_1(k+1) \qquad (78)$$

In practice, one identifies an approximation of $N_1(\cdot)$ denoted by $\hat{N}_1(\cdot)$. The control law is
then based on $\hat{N}_1(\cdot)$,

$$u(k+1) = \beta_0(k)^{-1} \left[ r(k+1) - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) - \hat{N}_1(\cdot) \right] \qquad (79)$$

The closed-loop system becomes

$$y(k+1) = N_1(\cdot) + r(k+1) - \gamma_1 y(k) - \cdots - \gamma_p y(k-p+1) - \hat{N}_1(\cdot) + e_1(k+1) \qquad (80)$$

Let $e_2(k+1)$ denote the approximation error, $e_2(k+1) = N_1(\cdot) - \hat{N}_1(\cdot)$. The tracking error is
now governed by

$$\varepsilon_m(k+1) + \gamma_1 \varepsilon_m(k) + \cdots + \gamma_p \varepsilon_m(k-p+1) = e_1(k+1) + e_2(k+1) \qquad (81)$$

A schematic diagram of the control scheme is the same as shown in Fig. 12, except that the 
block representing the identification model is now a non-linear neural network. 

8. Numerical Examples 

In this section, several examples will be presented to illustrate various concepts 
discussed in this paper. The case of a linear system is considered first, followed by a non- 
linear system. Both identification and control aspects of each case will be shown. 

8.1. Network Representation of a Linear System. Consider a linear 
single-input single-output system with three vibration modes at 0.40Hz, 1.37Hz, and 
2.21Hz, each with a damping factor of 0.5%. The state space matrices shown represent a 
discrete model at a sampling rate of 10 Hz. 
C = [1.0  -0.5  0.0  1.0  0.5  0.0],  D = 1.5

The system is excited by random input shown in Fig. 13 producing the response shown in 
Fig. 14. 



Figure 13: Excitation input time history. 
Figure 14: System response time history.



Using the above time histories, the network weights can be identified using Eq. (31). 
First, consider the case where p = 6, the following values for the network weights are 
obtained: 


a, =8.02x10-', a 2 = -4.59xl0 -3 , d 3 = -2.31 xlO" 2 
a A =4.11x10-*, d 5 =2.02x10-', a 6 = -7.24x10'' 

/)o = 1.50, 0i = -1.01, 0 2 =-5.48x10^, 03 = 1.59x10-' 

0 4 =-5.02x10-', 0s =-4.27x10-', 0 6 = 8.75x10'' (83) 

The above results are checked against the data by performing an open-loop prediction of the 
response using the input alone, 

$$\hat{y}(k) = \hat{\alpha}_1 \hat{y}(k-1) + \cdots + \hat{\alpha}_6 \hat{y}(k-6) + \hat{\beta}_0 u(k) + \hat{\beta}_1 u(k-1) + \cdots + \hat{\beta}_6 u(k-6) \qquad (84)$$

and a one-step ahead prediction (or observer estimation) using both actual input and
output data,

$$\hat{y}(k) = \hat{\alpha}_1 y(k-1) + \cdots + \hat{\alpha}_6 y(k-6) + \hat{\beta}_0 u(k) + \hat{\beta}_1 u(k-1) + \cdots + \hat{\beta}_6 u(k-6) \qquad (85)$$

It can be verified that in both cases the predicted responses match the actual data exactly.
Again, it should be emphasized that the result shown in Eqs. (83) represents a set of 
weights that can be identified from any feedforward network that uses 6 past values of 
input and output data to predict the current response. Specifically, if one uses a network 
consisting of a single neuron, then the values listed in Eqs. (83) are precisely the weights 
of this neuron. On the other hand, if a feedforward network consisting of several layers of 
linear neurons is used to identify the system, then the values in Eqs. (83) are the weights of 
a single neuron representation that is mathematically equivalent to the multi-layer network. 

The system in Eqs. (82) in fact contains one uncontrollable mode as revealed by the
singular values of the controllability matrix, $\mathcal{C} = [A^5 B, A^4 B, \ldots, AB, B]$,

$$\begin{aligned}
&\sigma_1 = 1.08, \quad \sigma_2 = 5.92 \times 10^{-1}, \quad \sigma_3 = 3.69 \times 10^{-1}, \quad \sigma_4 = 2.19 \times 10^{-1} \\
&\sigma_5 = 9.44 \times 10^{-17}, \quad \sigma_6 = 1.91 \times 10^{-17}
\end{aligned} \qquad (86)$$

The model in Eq. (84) is therefore an over-parameterized model. The same system can be
modeled by using data from only 4 past time steps to predict the current response, i.e., p =
4. The corresponding weights are given below:


a, = 2.29, a 2 = -2.67, a 3 = 2.26, oc 4 = -9.84 x 10-' 
A, = 1.50, A =-3.24, A = 3.72, A = -2.98, A = U9 


(87) 


Note that the over-parameterization in Eq. (84) is in the form of having more distant past 
input and output data to predict the current response, corresponding to the case of a neuron 
having additional input channels. This is in contrast to the case where over-parameterization 
is in the form of having additional neurons added to the network. 
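
The singular value test used above to detect an uncontrollable mode can be reproduced on a small case. The Python sketch below uses a hypothetical 3-state diagonal system (not the system of Eqs. (82)) in which the second mode is not driven by the input. The relative threshold used to count the significant singular values is an assumption of this sketch.

```python
import numpy as np

A = np.diag([0.5, 0.8, -0.3])          # hypothetical 3-state system
B = np.array([[1.0], [0.0], [1.0]])    # the second mode is not driven by the input
n = A.shape[0]

# Controllability matrix [A^(n-1) B, ..., A B, B]
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n - 1, -1, -1)])
sigma = np.linalg.svd(ctrb, compute_uv=False)

# Count singular values that are significant relative to the largest one
effective_order = int(np.sum(sigma > 1e-10 * sigma[0]))
print(sigma, effective_order)
```

The number of significant singular values gives the effective order seen from the input, which indicates how many past time steps a minimal input-output model needs.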

8.2. Model Reference Adaptive Control of A Linear System. Next, we 
consider the application of the model reference adaptive control of the above system. The 
goal is to have the system track a desired trajectory prescribed via the reference model, 

$$y_m(k+1) = 0.4\, y_m(k) + 0.5\, y_m(k-1) - 0.3\, y_m(k-2) + r(k+1) \qquad (88)$$

where $r(k) = \sin(k/2\pi)$. First, consider the ideal case where disturbance and noises are
not present. Since the system has a single output, the predictor network consists of only
one linear neuron. In this example, 6 past input and output values are used to predict the
current response. Recall that this is a case of over-parameterization since the effective order
of the system is only 4. The system is assumed to be unknown to the controller at the
beginning, and the weights are initially set to zero. Simultaneous prediction and control is
carried out producing the results shown in Figs. 15a-d below. Figure 15a shows that the
system response (dashed curve) quickly tracks the desired response (solid curve). The
time histories of the prediction error and of the tracking error during the process are shown
in Figs. 15b and 15c, respectively. The control input time history is shown in Fig. 15d,
revealing that the adaptive mechanism quickly produces the necessary control input to make
the system track the desired response.



Figure 15a: Tracking response.

Figure 15b: Prediction error.

Figure 15c: Tracking error.

Figure 15d: Control input.

Figures 16a-d show the adaptation when a disturbance, $d(k) = 0.5\cos(k/2\pi)$, and 
5% measurement noise are added to the system. With the same adaptive controller, the 
system continues to track the desired response, as shown in Fig. 16a. The effect of the 
noise can be seen in the random variation in the prediction error and tracking error time 
histories, Figs. 16b and 16c. The new control input history that makes the system track the 
desired response and accommodate this disturbance is shown in Fig. 16d. 




Figure 16a: Tracking response with 
disturbance and noise present. 


Figure 16b: Prediction error. 

Figure 16c: Tracking error. 


Figure 16d: Control input. 


8.3. Identification and Prediction of a Non-Linear System. While it is 
not possible to have a linear model that can reproduce the open-loop response of a non- 
linear system, it is possible to have a linear predictor that can reasonably predict the non- 
linear response. The predictor model uses actual input and output data to compute the 
predicted response. Consider the system whose state-space matrices were given previously, 
but whose input and output are now related by the following non-linear relationship, 

$$x(k+1) = A\,f[x(k)] + B\,u(k)$$

$$y(k) = g[x(k)] + D\,u(k) \tag{89}$$

where the non-linear functions $f[x(k)] = |x(k)|^{1/2}\,\mathrm{sgn}[x(k)]$ and $g[x(k)] = \sin[Cx(k)]$ operate 
on each element of the state vector $x(k)$. Note that in this example, the non-linear effect 
takes place from one sampling interval to the next. For the purpose of identification, the 
system is excited by a random input sequence and the resulting response is used in the 
series-parallel identification scheme. For the case p = 6, the following model coefficients 
are identified: 


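$$\alpha_1 = 1.35, \quad \alpha_2 = -5.40 \times 10^{-1}, \quad \alpha_3 = 1.39 \times 10^{-1}$$
$$\alpha_4 = -5.95 \times 10^{-2}, \quad \alpha_5 = 1.33 \times 10^{-1}, \quad \alpha_6 = -1.62 \times 10^{-1}$$

$$\beta_0 = 1.48, \quad \beta_1 = -1.89, \quad \beta_2 = 7.41 \times 10^{-1}, \quad \beta_3 = -8.74 \times 10^{-2}$$
$$\beta_4 = 9.58 \times 10^{-2}, \quad \beta_5 = -2.38 \times 10^{-1}, \quad \beta_6 = 1.93 \times 10^{-1} \tag{90}$$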
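As a minimal sketch of how data for such an identification could be generated, the code below simulates a system of the form of Eq. (89) under a random input and assembles one input vector for the series-parallel scheme. The matrices `A`, `B`, `C`, `D` are hypothetical two-state stand-ins, not the matrices used in the paper, and the regressor layout (p past measured outputs followed by p + 1 inputs) is an assumption chosen to match the count of six and seven coefficients in Eq. (90).

```python
import math
import random

def f(x):
    """Element-wise |x|^(1/2) sgn(x), the state non-linearity of Eq. (89)."""
    return [math.copysign(math.sqrt(abs(xi)), xi) for xi in x]

# Hypothetical system matrices (the paper's matrices are not reproduced here)
A = [[0.5, 0.2], [-0.2, 0.5]]
B = [1.0, 0.5]
C = [1.0, 0.5]
D = 0.1

def step(x, u):
    """Return (y(k), x(k+1)) for the system of Eq. (89)."""
    y = math.sin(sum(c * xi for c, xi in zip(C, x))) + D * u
    fx = f(x)
    x_next = [sum(a * v for a, v in zip(row, fx)) + b * u for row, b in zip(A, B)]
    return y, x_next

def simulate(u_seq):
    """Response to an input sequence, starting from a zero state."""
    x, ys = [0.0, 0.0], []
    for u in u_seq:
        y, x = step(x, u)
        ys.append(y)
    return ys

def regressor(ys, us, k, p):
    """Series-parallel input vector: y(k-1)..y(k-p), then u(k)..u(k-p)."""
    return [ys[k - i] for i in range(1, p + 1)] + [us[k - i] for i in range(0, p + 1)]

rng = random.Random(0)                       # fixed seed for repeatability
us = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = simulate(us)
row = regressor(ys, us, k=10, p=6)           # 6 output channels + 7 input channels
```

The key point of the series-parallel arrangement is that `regressor` draws on the measured outputs `ys`, not on outputs regenerated by the model itself.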

Recall that the identified model can be used either as an open-loop model or as a one-step 
ahead predictor. The model is checked against the response of the actual system to a 
sine-wave input excitation, $u(k) = \sin(0.2\pi k)$. This is shown in Fig. 17, where the solid curve 
is the actual response of the non-linear system and the dashed curve is the open-loop 
prediction using the identified linear model. As expected, the two curves do not match. 
However, when the same coefficients are used in the predictor model, the predicted 
response closely follows the actual response, as shown in Fig. 18. It is precisely this 
potential ability of the linear predictor model that makes it applicable to the control of 
non-linear systems. Furthermore, note that the coefficients are time-invariant. Allowing these 
coefficients to be time-varying, as in the case of model reference adaptive control, can further 
enhance the applicability of such models to non-linear systems. 
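The difference between the two uses of the same coefficients can be seen in a small sketch. Everything in it is a hypothetical stand-in: the scalar plant, the fixed coefficients `A_HAT` and `B_HAT` (which are not identified values), and the constant input chosen so the result is easy to check by hand. The open-loop model feeds back its own past output, whereas the predictor is re-anchored at every step by the measured output.

```python
import math

A_HAT, B_HAT = 0.8, 1.0  # hypothetical fixed linear coefficients

def plant(y, u):
    """Hypothetical non-linear plant: y(k+1) = 0.8 sin(y(k)) + u(k)."""
    return 0.8 * math.sin(y) + u

def compare(u_seq):
    """Absolute errors of open-loop and one-step-ahead predictions."""
    y = 0.0      # actual (measured) response
    y_ol = 0.0   # open-loop model response
    ol_err, os_err = [], []
    for u in u_seq:
        y_next = plant(y, u)
        y_os = A_HAT * y + B_HAT * u      # predictor: uses the measured y(k)
        y_ol = A_HAT * y_ol + B_HAT * u   # open loop: uses its own past output
        os_err.append(abs(y_next - y_os))
        ol_err.append(abs(y_next - y_ol))
        y = y_next
    return ol_err, os_err

ol_err, os_err = compare([0.3] * 200)
```

With these choices the open-loop error settles at a value several times larger than the one-step-ahead error, because the model mismatch accumulates through the open-loop recursion but not through the predictor.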




Figure 17: Open-loop prediction, 
(non-linear system) 


Figure 18: One-step ahead prediction, 
(non-linear system) 


8.4. Model Reference Adaptive Control of a Non-Linear System. 
Finally, this example illustrates the application of model reference adaptive control to the 
non-linear system considered in the above section. For comparison purposes, the same 
adaptive control law is used as in the linear case, except that it is now applied to the 
non-linear system. Figures 18a-d show the tracking response, the prediction error, the tracking 
error, and the control input time histories, respectively. Recall that the control method does 
not require an identified open-loop model, but rather a predictor model that can 
reasonably predict the response, which is the case illustrated in the previous example. 



Figure 18a: Tracking response, 
(non-linear system) 



Figure 18b: Prediction error. 






Figure 18c: Tracking error. 


Figure 18d: Control input. 


When disturbance and noise are added to the system, the resulting behavior of the 
system is shown in Figs. 19a-d. Again, this reveals a certain degree of stability robustness 
of the adaptive scheme to possible disturbances and noise. This is due to the inherent 
robustness of linear predictors in predicting the non-linear response. 

Figure 19a: Tracking response with disturbance and noise present (non-linear system). 

Figure 19b: Prediction error. 

Figure 19c: Tracking error. 

Figure 19d: Control input. 


9. Summary and Concluding Remarks 

This paper presents the basic concepts of the neural networks as related to the 
problem of modelling and control of a dynamic system. Two basic forms of the neural 
networks, the feedforward network and the recurrent network, are discussed. Emphasis is 
placed on the interpretation of the neural networks in terms of standard linear system theory 
so that better insight may be gained when these concepts are applied in practice. 
The relationships between the feedforward neural network and the state-space model, and 
between the recurrent network and the observer model, are explained. To identify a linear 
system, the discussion in this paper reveals that it is neither advantageous nor necessary to 
use a multi-layer network, but rather a single layer of linear neurons is adequate. The 
resultant simplified network is then equivalent to standard regressive models that are often 
used in adaptive systems theory. The real benefit of a neural network in system 
identification is in its capacity to capture non-linearities, in which case the neurons must be 
non-linear. With respect to the control of both linear and non-linear systems, however, it is 
shown that it is not the identification of the open-loop system that governs the stability of 
the tracking behavior, but rather the availability of a mechanism that can predict the future response 
based on actual available input-output data. It is shown that this mechanism can often be 
provided simply by a linear predictor. A linear predictor, consisting of a single layer of 
linear neurons, is in fact an optimal choice for a linear system. The same linear predictor 
can often be adequate for non-linear systems as well, making it directly applicable to the 
control of a non-linear system. The resulting control technique is simply model reference 
adaptive control, a well-known technique in linear system control. Such a linear predictor 
can be easily determined from input-output data. If implemented on-line, the method can 
also adapt to changing dynamics. The recent advent of the Observer/Kalman filter 
identification (OKID) method has motivated the design of controllers that are based directly 
on the observer Markov parameters. This paper shows that one such design is model 
reference control because the observer Markov parameters are precisely the coefficients of 
an optimal linear predictor. The design takes advantage of the ability of the predictor to 
handle certain non-linear systems, an often observed fact in practical implementation of the 
OKID method. 

To make adaptive control truly useful in practice, constraints with respect to 
sampling and computation speeds must be addressed. Naturally, the adaptive scheme 
places heavy emphasis on on-line measurements rather than some known model of the 
system for control. Sensor failure, therefore, becomes an important issue. Practical 
consideration may dictate a compromise between fixed-gain and adaptive control, thus 
requiring a mechanism to determine when adaptation should take place. The questions of 
sensor placement and sensor selection are also important, in order to avoid the 
situation where the system can produce bounded output when driven by unbounded input. 
This case requires theoretical treatment beyond that presented in this paper. Finally, 
the paper is concerned mostly with stability rather than performance robustness issues. 
Further work is required to assess this aspect of the problem. 